Next: Terminal I/O Encoding, Previous: Specifying Coding Systems, Up: Coding Systems [Contents][Index]
All the operations that transfer text in and out of Emacs have the ability to use a coding system to encode or decode the text. You can also explicitly encode and decode text using the functions in this section.
The result of encoding, and the input to decoding, are not ordinary text. They logically consist of a series of byte values; that is, a series of ASCII and eight-bit characters. In unibyte buffers and strings, these characters have codes in the range 0 through #xFF (255). In a multibyte buffer or string, eight-bit characters have character codes higher than #xFF (see Text Representations), but Emacs transparently converts them to their single-byte values when you encode or decode such text.
The usual way to read a file into a buffer as a sequence of
bytes, so you can decode the contents explicitly, is with
insert-file-contents-literally (see Reading from
Files); alternatively, specify a non-nil
rawfile argument when visiting a file with
find-file-noselect. These methods result in a
unibyte buffer.
The usual way to use the byte sequence that results from
explicitly encoding text is to copy it to a file or
process—for example, to write it with
write-region (see Writing to Files),
and suppress encoding by binding
coding-system-for-write to
no-conversion.
Here are the functions to perform explicit encoding or
decoding. The encoding functions produce sequences of bytes; the
decoding functions are meant to operate on sequences of bytes.
All of these functions discard text properties. They also set
last-coding-system-used to the precise coding system
they used.
This command encodes the text from start to
end according to coding system
coding-system. Normally, the encoded text replaces
the original text in the buffer, but the optional argument
destination can change that. If
destination is a buffer, the encoded text is
inserted in that buffer after point (point does not move); if
it is t, the command returns the encoded text as
a unibyte string without inserting it.
If encoded text is inserted in some buffer, this command returns the length of the encoded text.
The result of encoding is logically a sequence of bytes, but the buffer remains multibyte if it was multibyte before, and any 8-bit bytes are converted to their multibyte representation (see Text Representations).
Do not use undecided for
coding-system when encoding text, since that may
lead to unexpected results. Instead, use
select-safe-coding-system (see
select-safe-coding-system) to suggest a suitable
encoding, if there’s no obvious pertinent value for
coding-system.
This function encodes the text in string
according to coding system coding-system. It
returns a new string containing the encoded text, except when
nocopy is non-nil, in which case the
function may return string itself if the encoding
operation is trivial. The result of encoding is a unibyte
string.
This command decodes the text from start to
end according to coding system
coding-system. To make explicit decoding useful,
the text before decoding ought to be a sequence of byte
values, but both multibyte and unibyte buffers are acceptable
(in the multibyte case, the raw byte values should be
represented as eight-bit characters). Normally, the decoded
text replaces the original text in the buffer, but the
optional argument destination can change that. If
destination is a buffer, the decoded text is
inserted in that buffer after point (point does not move); if
it is t, the command returns the decoded text as
a multibyte string without inserting it.
If decoded text is inserted in some buffer, this command returns the length of the decoded text.
This command puts a charset text property on
the decoded text. The value of the property states the
character set used to decode the original text.
This function decodes the text in string
according to coding-system. It returns a new
string containing the decoded text, except when
nocopy is non-nil, in which case the
function may return string itself if the decoding
operation is trivial. To make explicit decoding useful, the
contents of string ought to be a unibyte string
with a sequence of byte values, but a multibyte string is
also acceptable (assuming it contains 8-bit bytes in their
multibyte form).
If optional argument buffer specifies a buffer, the decoded text is inserted in that buffer after point (point does not move). In this case, the return value is the length of the decoded text.
This function puts a charset text property on
the decoded text. The value of the property states the
character set used to decode the original text:
(decode-coding-string "Gr\374ss Gott" 'latin-1)
⇒ #("Grüss Gott" 0 9 (charset iso-8859-1))
This function decodes the text from from to
to as if it were being read from file
filename using insert-file-contents
using the rest of the arguments provided.
The normal way to use this function is after reading text from a file without decoding, if you decide you would rather have decoded it. Instead of deleting the text and reading it again, this time with decoding, you can call this function.
Next: Terminal I/O Encoding, Previous: Specifying Coding Systems, Up: Coding Systems [Contents][Index]